Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy

Authors

  • Jonathan Krause
  • Varun Gulshan
  • Ehsan Rahimy
  • Peter Karth
  • Kasumi Widner
  • Gregory S. Corrado
  • Lily Peng
  • Dale R. Webster
Abstract

PURPOSE To use adjudication to quantify errors in diabetic retinopathy (DR) grading based on individual graders and majority decision, and to train an improved automated algorithm for DR grading.

DESIGN Retrospective analysis.

PARTICIPANTS Retinal fundus images from DR screening programs.

METHODS Images were each graded by the algorithm, U.S. board-certified ophthalmologists, and retinal specialists. The adjudicated consensus of the retinal specialists served as the reference standard.

MAIN OUTCOME MEASURES For agreement between different graders as well as between the graders and the algorithm, we measured the (quadratic-weighted) kappa score. To compare the performance of different forms of manual grading and the algorithm for various DR severity cutoffs (e.g., mild or worse DR, moderate or worse DR), we measured area under the curve (AUC), sensitivity, and specificity.

RESULTS Of the 193 discrepancies between adjudication by retinal specialists and majority decision of ophthalmologists, the most common were missing microaneurysms (MAs) (36%), artifacts (20%), and misclassified hemorrhages (16%). Relative to the reference standard, the kappa for individual retinal specialists, ophthalmologists, and the algorithm ranged from 0.82 to 0.91, 0.80 to 0.84, and 0.84, respectively. For moderate or worse DR, the majority decision of ophthalmologists had a sensitivity of 0.838 and specificity of 0.981. The algorithm had a sensitivity of 0.971, specificity of 0.923, and AUC of 0.986. For mild or worse DR, the algorithm had a sensitivity of 0.970, specificity of 0.917, and AUC of 0.986. By using a small number of adjudicated consensus grades as a tuning dataset and higher-resolution images as input, the algorithm improved in AUC from 0.934 to 0.986 for moderate or worse DR.

CONCLUSIONS Adjudication reduces the errors in DR grading. A small set of adjudicated DR grades allows substantial improvements in algorithm performance. The resulting algorithm's performance was on par with that of individual U.S. board-certified ophthalmologists and retinal specialists.
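The outcome measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes grades are integers on a five-point severity scale (0 = none through 4 = proliferative), and the function names, the tie-breaking rule for the majority decision, and the sample data are all hypothetical.

```python
from collections import Counter

# Assumed five-point DR severity scale (illustrative only):
# 0 = none, 1 = mild, 2 = moderate, 3 = severe, 4 = proliferative.
NUM_GRADES = 5


def quadratic_weighted_kappa(a, b, k=NUM_GRADES):
    """Cohen's kappa with quadratic disagreement weights.

    Assumes the two graders do not both assign one single identical
    grade to every image (the expected disagreement would then be zero).
    """
    n = len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            num += weight * observed[i][j]               # observed disagreement
            den += weight * hist_a[i] * hist_b[j] / n    # chance disagreement
    return 1.0 - num / den


def majority_grade(grades):
    """Most frequent grade; ties go to the higher grade (an assumption)."""
    counts = Counter(grades)
    best = max(counts.values())
    return max(g for g, c in counts.items() if c == best)


def sensitivity_specificity(reference, predicted, cutoff):
    """Binarize at 'cutoff or worse' and compare against the reference.

    Assumes the reference contains at least one image on each side
    of the cutoff.
    """
    tp = fn = tn = fp = 0
    for r, p in zip(reference, predicted):
        if r >= cutoff:
            tp += p >= cutoff
            fn += p < cutoff
        else:
            fp += p >= cutoff
            tn += p < cutoff
    return tp / (tp + fn), tn / (tn + fp)


# Made-up grades for eight images, not data from the study.
reference = [0, 0, 1, 2, 2, 3, 4, 0]
predicted = [0, 1, 1, 2, 1, 3, 4, 0]
kappa = quadratic_weighted_kappa(reference, predicted)
sens, spec = sensitivity_specificity(reference, predicted, cutoff=2)
```

Here `cutoff=2` corresponds to "moderate or worse DR" under the assumed scale, and `cutoff=1` would correspond to "mild or worse DR". The quadratic weights penalize a two-step disagreement four times as heavily as a one-step disagreement, which suits an ordinal severity scale.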


Related articles

Neural Networks with Manifold Learning for Diabetic Retinopathy Detection

Widespread surveillance programs using remote retinal imaging have proven to decrease the risk from diabetic retinopathy, the leading cause of blindness in the US. However, this process still requires manual verification of image quality and grading of images for level of disease by a trained human grader, and it will continue to be limited by the scarcity of such graders. Computer-aided diagno...

Evaluating machine learning methods and satellite images to estimate combined climatic indices

The reflections recorded on satellite images have been affected by various environmental factors. In these images, some of these factors are combined with other environmental factors that cannot be distinguished. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...

Comparing the predictive power of an artificial neural network with multiple logistic regression in distinguishing diabetic patients with retinopathy from those without retinopathy

Background: Diabetes mellitus is a highly prevalent disease, and if not controlled, it causes complications and irreparable damage to the eye and can lead to blindness. The goal of this study is to investigate the predictive power of the multiple logistic regression model and the artificial neural network multi-layer perceptron (MLP) in determining patients with and without diabetic...

Telemedical retinopathy of prematurity diagnosis: accuracy, reliability, and image quality.

OBJECTIVE To prospectively measure accuracy, reliability, and image quality of telemedical retinopathy of prematurity (ROP) diagnosis. METHODS Two hundred forty-eight eyes from 67 consecutive infants underwent wide-angle retinal imaging by a trained neonatal nurse at 31 to 33 weeks' and/or 35 to 37 weeks' postmenstrual age (PMA) using a standard protocol. Data were uploaded to a Web-based tel...

An Efficient Integrated Approach for the Detection of Exudates and Diabetic Maculopathy in Colour fundus Images

Diabetic retinopathy (DR) is a major cause of blindness that could be prevented with an early screening process, and exudates are one of its primary signs. In this approach, digital image processing is applied to detect exudates in retinal images. An automated method to detect and localize the presence of e...



Journal:
  • Ophthalmology

Publication date: 2018